Abstract. The undesired interaction of a quantum system with its environment 
generally leads to a coherence decay of superposition states in time. A precise 
knowledge of the spectral content of the noise induced by the environment is crucial to 
protect qubit coherence and optimize its employment in quantum device applications. 
We experimentally show that the use of neural networks can greatly increase the 
accuracy of noise spectroscopy, by reconstructing the power spectral density that 
characterizes an ensemble of carbon impurities around a nitrogen-vacancy (NV) center 
in diamond. Neural networks are trained over spin coherence functions of the NV center 
subjected to different Carr-Purcell sequences, typically used for dynamical decoupling 
(DD). As a result, we determine that deep learning models can be more accurate than 
standard DD noise-spectroscopy techniques, while requiring a much 
smaller number of DD sequences. 


Keywords: Deep learning, Neural networks, Machine learning, Quantum machine 
learning, Quantum noise, Quantum sensing, Quantum noise spectroscopy. 

1. Introduction 


Quantum sensing combines theoretical results with experimental and engineering 
techniques to carry out inference of signals with improved accuracy and/or less 
computation time by making use of quantum physics [1, 2]. 

A quantum sensor takes advantage of the fragility of its quantum properties, such as 
quantum coherence or entanglement, to detect external perturbations 
with higher accuracy than any classical sensor. However, this same fragility 
implies that the quantum sensor is subject to detrimental noise stemming from the 
coupling with its environment. For this reason, it is desirable to fully characterize the 
sensor's environment, either to filter out its detrimental effect, or to take it into account 
when detecting external signals, for example, in algorithms using quantum optimal 
control [3, 4, 5, 6, 7]. 

Neural networks (NNs) [8, 9], i.e., algorithmic models provided by the 
interconnection of a group of nodes commonly called neurons, could be a powerful tool to 
infer the sensor's environment. In this context, deep learning has already been proposed 
theoretically for the classification and detection of quantum noise features [10, 11, 12], 
and employed experimentally for the following tasks: (a) estimating the spectra of 
minuscule amounts of complex molecules [13] for nano nuclear magnetic resonance; (b) 
sensing magnetic-field strength at room temperature with high precision [14, 15] 
by using nitrogen-vacancy (NV) centers; (c) performing error mitigation [16] and noise 
learning [17, 18, 19]; (d) tracking quantum trajectories [20]; (e) classifying 
many-body quantum states [21] in superconducting quantum circuits; (f) improving 
quantum error correction [22]. Quantum neural networks have also recently been investigated 
in order to solve a given quantum technology task with greater accuracy than classical 
NNs [23, 24, 25]. However, to our knowledge, experimental noise spectroscopy in single 
color centers in diamond via deep learning is still missing. 

In this paper, we demonstrate that NNs can be used to process the data obtained 
by a qubit, operating as a quantum sensor, and then reconstruct the noise spectrum 
that induces dephasing into the qubit itself. In particular, we focus on a qubit under 
dynamical decoupling (DD) control sequences [26, 27] in the presence of classical random 
noise with an unknown power density spectrum, usually denoted as noise spectral 
density (NSD). Beyond testing our machine learning models numerically, we use a 
single NV center in diamond as a spin qubit sensor and we perform a spectroscopic 
reconstruction of the magnetic noise of its local environment. The latter comprises 
13C nuclear spins randomly distributed in the diamond lattice [28, 29, 30] (see Fig. 1). 
The dephasing affecting the qubit sensor is analyzed by applying a set of DD control 
pulses that realize filter functions [26, 27, 31, 32] in the frequency domain. The filter 
functions are designed to select specific noise components, without sensing all other 
system-bath interactions. A widely used DD control sequence is the Carr-Purcell (CP) 
sequence [33, 1] that is given by N equidistant π pulses, performed between an initial 
and a final π/2 pulse. CP sequences act in the frequency domain approximately as 
Dirac comb filters [34]; hence, they have been used to perform spectroscopy of intricate 
signals, e.g., for noise spectroscopy [35, 36]. With this protocol, the requirement to 
achieve high values of the noise reconstruction accuracy is to perform sequences with 
a high number of pulses, meaning N ∈ [30, 120] (as in Ref. [37]) or higher, so that 
the Dirac comb filter approximation remains valid (in fact, N determines the filter 
width). This usually leads to long experiments to reconstruct the whole spectrum 
of the noise. Other techniques using non-equidistant or even more sophisticated DD 
sequences [4, 38, 39, 40, 41] have proved to be effective for noise sensing, but sometimes 
at the price of a higher computational burden. 

Figure 1: NV center and Neural Networks for noise spectroscopy. The 
NV center is surrounded by an ensemble of 13C nuclear spins (orange spheres) that 
collectively induce dephasing to the NV electronic spin (blue sphere). The NV electronic 
spin is controlled with a DD sequence (specifically, a Carr-Purcell (CP) sequence) with 
the aim to measure its dephasing, and therefore characterize the NSD of the nuclear 
spin bath, i.e., S(ω; s_0, A, σ). The CP sequence is formed by N equidistant π pulses in 
between an initial and a final π/2 pulse. The time τ between the π pulses determines the 
measurement total time T = Nτ, given that the time between the first π/2 pulse and the train 
of π pulses and the time between the last π and π/2 pulses are both equal to τ/2. Then, 
we measure the output of this experiment, which is the probability P = (1 + C(τ, N))/2 
that the NV center remains in the initial state |0⟩. The spin coherence function C(τ, N), 
evaluated at previously determined inter-pulse times in the set τ ∈ {τ_1, τ_2, ..., τ_m}, is 
the input of the designed Neural Networks (NNs). For illustrative purposes, here we 
only consider one fixed value of N. In our study we also consider a set of different values 
of N [Sec. 2.3]. After being trained, the NNs return the estimation of the NSD parameters. 

For our sensing task, NNs are designed to solve a regression problem, i.e., the 
reconstruction of the NSD. Here, we assume that the NSD of the bath of spins has a 
Gaussian profile [37, 42, 43]. The Gaussian NSD is thus parametrized as a function 
of key parameters, i.e., the mean value, variance, offset and noise power that we aim 
to reconstruct. Note that our proposal can be adapted to other parametrized NSD 
functions. The NNs are trained over a set of synthetic data generated by simulating 
how the coherence of the qubit sensor decays over time under the influence of both the 
CP control pulses and the NSD. Moreover, to make the measurement statistics as close 
as possible to the ones obtained from the experiments, extra artificial errors sampled 
from a normal distribution are added. 

Our approach using NNs entails the following advantages that we have proven 
experimentally. (i) NNs have the capability to make predictions on never-before-seen experimental 
data, and they can achieve a better reconstruction accuracy (even up to 7 times 
better, as shown in the section Results below) than standard noise spectroscopy, such as the 
Alvarez-Suter method [36], while making use of DD control sequences 
with a much smaller number of pulses. (ii) The training dataset, which can contain 
both synthetic and experimental data, is generated just once and then it can be applied 
several times, as long as the newly collected data reproduce the physical context under 
analysis. In connection with (i), we are going to show that the amount of data used as 
input to the NNs can be smaller than the one needed to resolve the NSD by means of 
standard noise spectroscopy methods. 

To our knowledge, this work is the first experimental proof of enhanced 
reconstruction performance with NNs for carrying out noise spectroscopy in single color 
centers in diamond. We thus expect that the techniques discussed here could quickly become 
a novel standard spectroscopy tool both for such quantum systems and for other quantum 
platforms in which regression problems have to be solved. 


2. Results 


2.1. Generation of training dataset 


The training dataset is composed of synthetic data generated by simulating the 
coherence decay of the qubit sensor in a noise spectroscopy experiment based on DD, such as 
the one depicted in Fig. 1. This standard sensing procedure, which stems from Ramsey 
interferometry [1], maps information about the quantum coherence of the sensor into 
the population in |0⟩ that is then effectively recorded. After having initialized the qubit 
sensor in the ground state |0⟩, a π/2 pulse is applied such that the qubit state |ψ⟩ is 
the superposition (|0⟩ + |1⟩)/√2. Then, we perform a CP control sequence consisting 
of a train of π pulses that repeatedly flips the qubit, and finally, a second π/2 pulse is 
applied in order to map the phase of the qubit into its population. The probability that 
the state of the quantum sensor is |0⟩, which corresponds to the observable population, 
equals [1, 37] 

$$ P = \frac{1}{2}\left(1 + C(\tau, N)\right), \tag{1} $$

where N is the number of π pulses and τ is the time between them. The coherence 
function C(τ, N) is simulated numerically, for a set of different values of τ and N, to 
generate the training dataset. 

Let us now introduce the decoherence function that quantifies how the quantum 
coherence C(τ, N) is modified under the action of both the external bath of spins and a 
set of CP control pulses. The control sequence has the effect of modulating the coherence 
content of the qubit sensor, while the interaction with the bath, associated with the NSD 
S(ω), tends on average to destroy such coherence. Overall, under the joint presence 
of control fields and a noise source, the coherence decays as C(τ, N) = e^{−χ(τ, N)}, where 
χ(τ, N) denotes the decoherence function [44, 45, 32, 46]: 

$$ \chi(\tau, N) = \int_0^{\infty} \frac{d\omega}{\pi\omega^2}\, F(\omega, \tau, N)\, S(\omega). \tag{2} $$


In Eq. (2), the filter function F(ω, τ, N) = |Y(ω, τ, N)|² is the square modulus of the 
Fourier transform of the so-called modulation function y(t, τ, N). The latter is piecewise 
constant, with values ±1, and switches sign at the times t = τ/2, 3τ/2, ..., (N − 1/2)τ 
where each π pulse is applied [2]. Notice that we are assuming that the π pulses are 
instantaneous, a reasonable assumption for our experimental setup where a π pulse 
duration is ~ 0.1 μs and the time between pulses is τ ∈ [3.3, 6.1] μs. Let us now recall 
the expression, in the frequency domain, of the filter function for a CP sequence with 
even N: 

$$ F(\omega, \tau, N) = 8 \sin^2\!\left(\frac{\omega\tau N}{2}\right) \sec^2\!\left(\frac{\omega\tau}{2}\right) \sin^4\!\left(\frac{\omega\tau}{4}\right), \tag{3} $$

while for odd N, sin²(ωτN/2) has to be replaced with cos²(ωτN/2) [31, 2]. 

In order to generate the training dataset, the NSD S(ω) is parameterized as 

$$ S(\omega) = s_0 + A \exp\!\left(-\frac{(\omega - \omega_c)^2}{2\sigma^2}\right). \tag{4} $$

Thus, being a Gaussian distribution, the NSD is fully described by the offset s_0, 
amplitude A, width σ and center ω_c. For the training dataset in the paper, the values 
of these parameters are taken from the following intervals: s_0 ∈ [4·10⁻⁴, 4·10⁻³] MHz; 
A ∈ [0.3, 0.7] MHz; σ ∈ [2·10⁻³, 9·10⁻³] MHz. Instead, ω_c is kept constant. This 
is because in our experimental setup the NSD stems from the interaction with a large 
ensemble of unresolved 13C impurities (nuclear spin bath) around the NV electronic 
spin. Therefore, the center of the NSD corresponds to the Larmor frequency ω_c = γB, 
where γ = 1.0705 kHz/G is the gyromagnetic ratio of the 13C nuclear spins, and B is 
the amplitude of a static magnetic field aligned with the NV quantization axis, z. Such 
static magnetic field is well known during the experimental procedure since it determines 
the NV electronic spin resonances (B = 403.2 ± 2 G). 
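
As a numerical sanity check of Eqs. (3) and (4), the following Python sketch evaluates the CP filter function for even and odd N and compares it with a direct Fourier transform of the piecewise-constant modulation function y(t). The function names are hypothetical, and the normalization F = ω²|Y|²/2 used in the comparison is an assumption made here so that the closed form and the direct transform can be compared; none of this is taken from the authors' code.

```python
import numpy as np

def gaussian_nsd(omega, s0, A, sigma, omega_c):
    """Gaussian NSD of Eq. (4): offset s0 plus a Gaussian peak centered at omega_c."""
    return s0 + A * np.exp(-((omega - omega_c) ** 2) / (2.0 * sigma ** 2))

def cp_filter(omega, tau, N):
    """CP filter function of Eq. (3); sin^2 is replaced by cos^2 for odd N."""
    x = np.asarray(omega, dtype=float) * tau
    envelope = np.sin(N * x / 2) ** 2 if N % 2 == 0 else np.cos(N * x / 2) ** 2
    return 8.0 * envelope * np.sin(x / 4) ** 4 / np.cos(x / 2) ** 2

def cp_filter_from_modulation(omega, tau, N):
    """Same quantity from the Fourier transform Y of the modulation function y(t).

    y(t) = +/-1 flips sign at t = tau/2, 3*tau/2, ..., (N - 1/2)*tau on [0, N*tau].
    Assumed normalization (not stated in the text): F = omega^2 |Y|^2 / 2.
    """
    omega = np.asarray(omega, dtype=float)
    edges = np.concatenate(([0.0], (np.arange(N) + 0.5) * tau, [N * tau]))
    Y = np.zeros_like(omega, dtype=complex)
    for k in range(N + 1):  # y(t) = (-1)^k on the k-th interval
        Y += (-1) ** k * (np.exp(1j * omega * edges[k + 1])
                          - np.exp(1j * omega * edges[k])) / (1j * omega)
    return 0.5 * omega ** 2 * np.abs(Y) ** 2

# Frequencies in rad/us, chosen away from the poles of sec^2(omega*tau/2)
omega = np.linspace(0.1, 3.0, 7)
for N in (1, 8, 16):
    assert np.allclose(cp_filter(omega, 3.4, N),
                       cp_filter_from_modulation(omega, 3.4, N))
```

The closed form of Eq. (3) has removable singularities where cos(ωτ/2) vanishes, so a numerical integration of Eq. (2) may prefer the modulation-function form, which has no such poles.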

The training dataset is generated by uniformly sampling 10⁴ sets of parameters 
within the chosen intervals. Hence, overall we consider 10⁴ distinct sets of 
NSD parameters that are used to simulate different coherence curves C(τ, N). These 
curves are evaluated in the time intervals τ ∈ [3.3, 3.66] μs and [5.5, 6.1] μs with 
sampling time Δτ = 1 ns (Δτ = 20 ns in the experimental case, see below), and 
for N ∈ {1, 8, 16, 24, 32, 40, 48}. These intervals are significant for our study because 
they include the values of τ at which the coherence decay curve exhibits the first and 
second order collapses induced on the qubit sensor by the bath of 13C impurities (for 
the coherence curves, the first and second order of the collapses refer to the harmonics 
of the filter functions F(ω, τ, N); for more details see Ref. [37]). Finally, in order to 
make the synthetic data used to train the NNs closer to the experimental setting, 
extra artificial errors sampled from a normal distribution with zero expected value 
and standard deviation equal to 0.05 (comparable with the expected error in our 
experimental measurements) are added to every point of the generated coherence decay 
curves. In this way, one may mitigate the over-fitting of the employed machine learning 
models, which are thus expected to generalize better to unseen data. In general, a model 
trained on synthetic data cannot be successfully applied to real data without fine-tuning 
it. In our case this is possible, probably because the simulated 
coherence decay data are quite close to the experimentally observed decay 
induced by the environment. 

As a final remark, notice that, from the 10⁴ simulated curves C(τ, N), 6000 are 
used for the training of the NNs and 2000 for their validation. The test step is 
performed either by using the remaining 2000 simulated curves, or by using experimental 
data as described below. 
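
The dataset generation described in this subsection can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: frequencies are taken in rad/μs with ω_c = 2πγB, the NSD parameters are used numerically as given in the intervals above, the decoherence function is taken as χ = (1/π) ∫ dω F S / ω² with F/ω² computed as |Y|²/2 from the modulation function, only N ∈ {1, 8, 16} and a 20 ns grid are simulated, and 10 samples stand in for the 10⁴ of the text. The flat offset s_0 is integrated analytically (it contributes s_0 Nτ/2 under this normalization), and all function names are hypothetical.

```python
import numpy as np

# Assumptions of this sketch: rad/us units, omega_c = 2*pi*gamma*B, reduced N set.
OMEGA_C = 2 * np.pi * 1.0705e-3 * 403.2
TAUS = np.concatenate([np.linspace(3.30, 3.66, 19),
                       np.linspace(5.50, 6.10, 31)])   # us, 20 ns step
PULSES = (1, 8, 16)                                     # subset of {1, 8, ..., 48}
BOUNDS = np.array([[4e-4, 4e-3], [0.3, 0.7], [2e-3, 9e-3]])  # s0, A, sigma

def filter_over_omega2(omega, tau, N):
    """F / omega^2, computed as |Y|^2 / 2 from the modulation function y(t)."""
    edges = np.concatenate(([0.0], (np.arange(N) + 0.5) * tau, [N * tau]))
    Y = np.zeros_like(omega, dtype=complex)
    for k in range(N + 1):  # y(t) = (-1)^k between consecutive pi pulses
        Y += (-1) ** k * (np.exp(1j * omega * edges[k + 1])
                          - np.exp(1j * omega * edges[k]))
    return 0.5 * np.abs(Y / omega) ** 2

def coherence_curve(s0, A, sigma, n_grid=801):
    """Concatenated C(tau, N) = exp(-chi) for all tau in TAUS and N in PULSES."""
    omega = np.linspace(OMEGA_C - 6 * sigma, OMEGA_C + 6 * sigma, n_grid)
    d_omega = omega[1] - omega[0]
    peak = A * np.exp(-((omega - OMEGA_C) ** 2) / (2 * sigma ** 2))
    out = []
    for N in PULSES:
        for tau in TAUS:
            # Gaussian peak: Riemann sum; flat offset s0: exact value s0*N*tau/2
            chi = (np.sum(filter_over_omega2(omega, tau, N) * peak) * d_omega / np.pi
                   + 0.5 * s0 * N * tau)
            out.append(np.exp(-chi))
    return np.array(out)

def make_dataset(n_samples, noise_std=0.05, seed=0):
    """Uniformly sampled NSD parameters and the corresponding noisy coherence curves."""
    rng = np.random.default_rng(seed)
    params = rng.uniform(BOUNDS[:, 0], BOUNDS[:, 1], size=(n_samples, 3))
    curves = np.array([coherence_curve(*p) for p in params])
    curves += rng.normal(0.0, noise_std, size=curves.shape)  # emulate measurement errors
    return curves, params

X, y = make_dataset(10)                 # 10 samples here; 10^4 in the text
n_tr, n_va = 6, 2                       # 60/20/20 split, as in 6000/2000/2000
X_tr, X_va, X_te = X[:n_tr], X[n_tr:n_tr + n_va], X[n_tr + n_va:]
```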


2.2. Neural networks working principles 


Let us describe the main working features of the NNs employed in this paper to carry 
out noise spectroscopy. Specifically, we are going to use the multi-layer perceptron 
(MLP) that is composed of fully-connected layers, each of them with a variable number 
of artificial neurons. 

A single artificial neuron returns as output the scalar 

$$ \hat{y} = \Sigma\left(\mathbf{w}^T \cdot \mathbf{x} + b\right) \tag{5} $$

that, by definition, is provided by applying the non-linear function Σ : R → R to the 
weighted sum of the input vector x ∈ R^k to which the bias term b ∈ R is added. w ∈ R^k 
denotes the vector of weights. In our analysis, the activation function Σ is chosen equal 
to the rectifier Σ(x) = max(0, x) [47, 48]. Thus, an MLP layer composed of q neurons 
(each with k inputs) returns the vector 

$$ \hat{\mathbf{y}} = \Sigma\left(W^T \mathbf{x} + \mathbf{b}\right), \tag{6} $$

where ŷ ∈ R^q, W ∈ R^{k×q} is the matrix of weights (W collects all the weight vectors of 
the single neurons), and b ∈ R^q is the vector of the biases. Hence, an MLP with L layers 
is ruled by the recursion equation 

$$ \mathbf{h}[\ell] = \Sigma\left(W[\ell]^T\, \mathbf{h}[\ell - 1] + \mathbf{b}[\ell]\right), \tag{7} $$

where ℓ = 1, ..., L is the index over the number of layers and h[0] = x. In Eq. (7), 
W[ℓ] and b[ℓ] are, respectively, the weights and the biases of the ℓ-th layer. The output 
vector of the MLP is ŷ = h[L]. It is worth noting that the number, dimension and 
activation functions of the NN layers (usually denoted as the hyperparameters ξ) 
are chosen through a single optimization routine (cfr. Methods). 
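
The recursion of Eq. (7) with the rectifier activation can be sketched in a few lines of NumPy. The layer widths and the random initialization below are hypothetical placeholders chosen for illustration (the actual hyperparameters are selected by the optimization routine mentioned above), and the function names are not taken from the authors' code.

```python
import numpy as np

def relu(z):
    """Rectifier activation, max(0, z), applied element-wise."""
    return np.maximum(0.0, z)

def mlp_forward(x, weights, biases):
    """Apply h[l] = relu(W[l]^T h[l-1] + b[l]) for l = 1..L, with h[0] = x."""
    h = x
    for W, b in zip(weights, biases):
        h = relu(W.T @ h + b)
    return h  # h[L], the output vector of the MLP

rng = np.random.default_rng(0)
sizes = [150, 64, 32, 3]  # hypothetical widths: 150 inputs, 3 outputs (s0, A, sigma)
weights = [rng.normal(0.0, 0.1, size=(k, q)) for k, q in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(q) for q in sizes[1:]]

x = rng.uniform(0.0, 1.0, size=sizes[0])  # placeholder coherence values
y_hat = mlp_forward(x, weights, biases)
```

Following Eq. (7) literally, the activation is applied to the last layer as well, so the outputs are non-negative, which is compatible with the positive NSD parameters.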
Let us now introduce the supervised learning process. Ideally, the purpose of the 
latter is to find the parameters θ* = argmin_θ R_D(θ, ξ) that minimize the theoretical risk 
function 

$$ R_{\mathcal{D}}(\theta, \xi) = \mathbb{E}_{(\mathbf{x},\mathbf{y}) \sim \mathcal{D}}\left[\mathcal{L}(\hat{\mathbf{y}}, \mathbf{y})\right], \tag{8} $$

where θ = {W[1], b[1], ..., W[L], b[L]}, and ŷ are the estimated values of y. By 
definition, R_D is the expected value of the loss function ℒ for (x, y) sampled from the 
distribution D that generates the dataset [49]. The loss function ℒ is a differentiable 
function that measures the distance between the prediction ŷ (output of the MLP) 
and the desired output y. However, since only a finite set 
S = {(x, y)_1, ..., (x, y)_m} of samples is available to train, validate and test the employed ML 
models, the theoretical risk function is approximated by the empirical risk function. 
Considering the partition {S_tr, S_va, S_te} of S in training (S_tr), validation (S_va) and test 
(S_te) sets, the empirical risk function is defined by: 

$$ R_{S_{tr}}(\theta, \xi) = \frac{1}{|S_{tr}|} \sum_{(\mathbf{x},\mathbf{y}) \in S_{tr}} \mathcal{L}(\hat{\mathbf{y}}, \mathbf{y}), \tag{9} $$

where |S_tr| is the cardinality of the training set. In fact, R_{S_tr} is the arithmetic mean of 
the loss function ℒ evaluated on the samples of the training set S_tr. 

In our paper, we take the loss function ℒ equal to the Mean Squared Error (MSE), 
also called L2 loss: 

$$ \mathcal{L}(\hat{\mathbf{y}}, \mathbf{y}) = \frac{1}{q} \sum_{i=1}^{q} \left(\hat{y}_i - y_i\right)^2 \tag{10} $$

for the q outputs of the last layer (in our case three, corresponding to the noise 
parameters s_0, A, σ). The MLP is trained by minimizing (step-by-step over time) the 
empirical risk function R_{S_tr}(θ, ξ) with respect to θ by means of the mini-batch gradient 
descent method, so as to obtain the optimal value θ* of the NN parameters. Each 
gradient descent step is defined by 

$$ \theta_{t+1} = \theta_t - \eta \nabla_\theta \sum_{b=1}^{B} \mathcal{L}(\hat{\mathbf{y}}_{b,t}, \mathbf{y}_{b,t}), \tag{11} $$

where θ_0 is a randomly chosen starting point, η is the learning rate that defines the length 
of the step and ∇_θ Σ_{b=1}^{B} ℒ(ŷ_{b,t}, y_{b,t}) is the gradient of the loss function. The gradient 
is calculated for any time t on a batch of B elements taken from the training set, and 
the subscript θ in ∇_θ indicates that the variables of ℒ during the gradient evaluation 
are the weights of the NN. In this paper, R_{S_tr} is minimized by means of Adam [50], 
a gradient-based optimization algorithm performing the adaptive estimation 
of lower-order moments. The minimization is stopped when the time-derivative of the 
risk function evaluated on the validation set, R_{S_va}(θ*, ξ), becomes positive (early stopping 
strategy) or after a predefined number of epochs (an epoch being a sequence of gradient 
steps that uses all the data of the training set). Then, we use R_{S_va}(θ*, ξ) to check if the MLP works also for unseen 
data and tune the hyperparameters ξ (cfr. Methods). Finally, the test set S_te is employed 
to calculate the metrics (discussed in detail below) used to generate the figures with the 
results that we are going to illustrate. 

Figure 2: Mean-square-errors (MSE) between original and estimated NSD parameters 
for a set of 2000 test cases. Orange bullets with dash-dotted line are the mean values 
returned by NNs. Blue squares with dotted line are the mean values provided by the 
HS method. Finally, shaded areas denote the standard deviation, taking into account 
all the 2000 cases. 
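
The training procedure of this subsection (mini-batch gradient descent on the empirical MSE risk, stopped early when the validation risk stops decreasing) can be illustrated with a deliberately simplified sketch. To keep it short and dependency-free, a single linear layer replaces the MLP and plain gradient descent replaces Adam; the batch gradient is averaged rather than summed, which only rescales the learning rate η of Eq. (11). All names, the toy data and the hyperparameter values are hypothetical.

```python
import numpy as np

def mse(y_hat, y):
    """Mean squared error, averaged over samples and outputs."""
    return np.mean((y_hat - y) ** 2)

def train_linear(X_tr, y_tr, X_va, y_va, eta=0.05, batch=16, max_epochs=200, seed=0):
    """Mini-batch gradient descent on the MSE with early stopping on the validation risk."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 0.01, size=(X_tr.shape[1], y_tr.shape[1]))
    b = np.zeros(y_tr.shape[1])
    best = (np.inf, W.copy(), b.copy())
    for _ in range(max_epochs):
        order = rng.permutation(len(X_tr))          # one epoch = one pass over S_tr
        for start in range(0, len(order), batch):
            idx = order[start:start + batch]
            err = X_tr[idx] @ W + b - y_tr[idx]     # residuals, shape (B, q)
            W -= eta * 2.0 * X_tr[idx].T @ err / err.size   # gradient of the batch MSE
            b -= eta * 2.0 * err.sum(axis=0) / err.size
        val = mse(X_va @ W + b, y_va)
        if val >= best[0]:                          # validation risk stopped decreasing
            break
        best = (val, W.copy(), b.copy())
    return best

# Toy regression problem standing in for the (coherence curve, NSD parameters) pairs
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 5))
true_W = rng.normal(size=(5, 3))
Y = X @ true_W + 0.01 * rng.normal(size=(400, 3))
val_risk, W, b = train_linear(X[:300], Y[:300], X[300:], Y[300:])
```

The parameters returned are those with the lowest validation risk seen before stopping, which is one common way of implementing the early stopping strategy.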


2.3. Training and numerical test of neural networks 


We now show the results obtained by using the trained machine learning models to infer 
the value of the NSD parameters {s_0, A, σ}. As already mentioned, the NNs are tested 
with 2000 different sets of NSD parameters. For each of these sets of parameters, the curves 
C(τ, N) have been simulated as described in the previous subsections. 

In order to determine the smallest amount of data required to reconstruct the 
NSD, we perform the training, validation and test of the NNs with sub-sets of the 
simulated curves. These sub-sets are defined by introducing the variable N_max that 
denotes the upper bound for the number of pulses N ≤ N_max considered during the whole 
process. For example, for N_max = 16 only the curves C(τ, N) with N ∈ {1, 8, 16} are 
considered. Note that the sub-sets defined for each value of N_max contain the curves 
for all the different NSD parameters (6000 for training, 2000 for validation, and 2000 
for testing), and for all the times τ in the intervals defined in section 2.1. In detail, 
the input of the neural network is defined as the concatenation of all the values of 
C(τ, N), for τ in the intervals defined before and N = 1, 8, 16, ..., N_max. Specifically, 
x = {C(τ_1, 1), ..., C(τ_m, 1), C(τ_1, 8), ..., C(τ_m, 8), ..., C(τ_1, N_max), ..., C(τ_m, N_max)}. 
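
The construction of the input vector can be made concrete with a short Python sketch. The dictionary layout, the name `build_input` and the placeholder values are assumptions made for illustration only; the point is that restricting the upper bound on the number of pulses simply shortens the concatenated vector.

```python
import numpy as np

PULSES = (1, 8, 16, 24, 32, 40, 48)

def build_input(curves, n_max):
    """Concatenate C(tau_1..tau_m, N) for all N in PULSES with N <= n_max.

    curves: dict mapping each N to a 1D array of coherence values over the tau grid.
    """
    selected = [N for N in PULSES if N <= n_max]
    return np.concatenate([curves[N] for N in selected])

m = 50  # number of tau values (19 + 31 points on a 20 ns grid)
curves = {N: np.full(m, 1.0 / N) for N in PULSES}  # placeholder values, not real data

x16 = build_input(curves, n_max=16)   # uses N = 1, 8, 16  -> 150 entries
x48 = build_input(curves, n_max=48)   # uses all seven N   -> 350 entries
```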

The results of this analysis are shown in Fig. 2 (orange data), where the MSE (the 
loss function) between the inferred parameters (ŝ_0, Â, σ̂) and the original parameters 
(s_0, A, σ) used to generate the dataset is plotted as a function of N_max. Remarkably, the 
MSE seems to saturate at its minimum value from N_max = 16 onward. This entails that the NNs do 
not significantly improve their precision on the reconstruction of the NSD by using more 
data to train them beyond this point. 

To establish how accurately a NN reconstructs the NSD, we need to compare the 
corresponding results with those of a different method. In particular, we concentrate 
on the method used in Ref. [37], which is itself based on Refs. [36, 35]. According to 
them, the decay of the coherence function C(τ, N) is analyzed as a function of N, for 
each fixed value of τ_i, i.e., for each fixed frequency component of the filter functions. 
In the limit of high N, the decay of the coherence is exponential, with a rate that is 
proportional to the amplitude of the NSD [35]. In other words, the amplitude 
of the NSD is directly estimated for a discrete set of frequencies (each proportional to 
1/τ). In contrast with the original proposals in Refs. [36, 35], the method in Ref. [37] 
demonstrates that it is better to use the harmonics of the filter functions to reconstruct 
the NSD, in order to avoid extra broadening of the reconstructed spectrum. For this 
reason, we denote this method as Harmonics Spectroscopy (HS). 

We have analyzed the same 2000 different curves C(τ, N) (used to test the machine 
learning models) also with the HS method. The results are collected and shown in 
Fig. 2 (blue data), where the first point is for N_max = 16. This is due to the fact that, 
by definition, the HS method fits the decay of the coherence as a function of N. This 
is possible only for a dataset with at least three points (in this case N = 1, 8, 16). As 
one can observe in Fig. 2, the MSE values for the HS method (blue region) are always 
above the MSE values for the NN method (orange region), especially for lower values of 
N_max. These results demonstrate that the NN method can predict the parameters of the 
NSD with an improved accuracy (up to 5 times better) with respect to the HS method. 
The tests presented in this subsection have been performed with simulated data. In the 
next subsection we are going to repeat the same tests but with experimental data. 


2.4. Experimental test of neural networks 


By this point we know that NNs can reliably predict the NSD from noisy simulated data. 
In this section, we want to use the NNs (trained and validated with noisy simulated data) 
to reconstruct the NSD using experimental data. 

As quantum sensor we use a spin qubit encoded in the electronic spin of the 
ground state of a single nitrogen-vacancy (NV) center in a bulk diamond at room 
temperature. This system has proven to be a sensitive quantum probe of magnetic fields, 
with outstanding spatial resolution and sensitivity [51, 52]. The diamond sample in 
our experiments has a natural abundance of 13C impurities (1.1%) that are randomly 
distributed in the diamond lattice [28, 29, 30]. The 13C nuclear spins constitute 
the external environment of the NV center. They act as a collective bath of spins 
that induces dephasing in the NV electronic spin, limiting its coherence time to 
T2 ~ 100 μs. In the presence of a strong bias magnetic field (> 150 G) [53, 37], the weak 
coupling of the NV spin with these carbon impurities can be modeled as a classical 
stochastic field. The latter has a power spectral density (here called NSD) that follows 
a Gaussian distribution centered at the Larmor frequency of the 13C nuclear spins. In 
order to measure the NV spin coherence function C(τ, N), we apply a train of π pulses 
(in our case a CP sequence) to the NV spin qubit following the DD protocol described 
in Fig. 1. For more details on the experimental implementation and Hamiltonian of the 
system see Ref. [37]. We have performed this experiment for N ∈ {1, 8, 16, 24, 32, 40, 48}, 
and for τ ∈ [3.3, 3.66] μs and [5.5, 6.1] μs with sampling time Δτ = 20 ns. The results 
are shown in Fig. 3(a) (blue bullets). Then, the collected coherence functions have been 
processed and employed to reconstruct the NSD parameters by means of both the NN 
(trained with the generated dataset) and the HS methods. In contrast with the test 
using simulated data in the previous section, in the experimental case we do not know 
the exact values of the NSD parameters. Therefore, we cannot calculate the MSE 
to quantify the accuracy of the reconstructed parameters. In order to estimate such 
accuracy we have used the following procedure: from the inferred NSD, the coherence 
curves C(τ, N) are simulated and then compared with the experimental results. An 
example of this comparison is shown in Fig. 3(a), where C(τ, N) is simulated under the 
assumption that the NSD parameters are inferred either by the machine learning models 
(orange) or by the HS method (red), both for N_max = 16. Qualitatively it is clear that the 
orange curves are much closer to the experimental data than the red curves. 

Figure 3: (a) Coherence function C(τ, N). The experimental data (blue bullets) are 
shown together with the simulated ones using the NSD predicted respectively by the 
HS method (red lines) and machine learning models (orange lines), both for N_max = 16. (b) 
Reduced chi-squared χ², obtained by comparing simulation and experimental data, as a 
function of N_max. As in panel (a), orange and red curves refer to the NN and HS method, 
respectively. Instead, the dashed line denotes the value of the reduced chi-squared for 
the HS method when we employ additional measurements for N = 56, 64, 72, 80 in the 
interval τ ∈ [5.5, 6.1] μs. Inset: Same results but quantified by the Mean-Absolute-Error 
(MAE) between the experimental data and the predicted C(τ, N). 

There are several options to quantitatively compare the experimental data and the 
simulation results. Here we use both the reduced chi-squared χ² [54], and the 
Mean-Absolute-Error (MAE) [55] between the experimental data and the predicted coherence 
functions C(τ, N) (see Methods for more details). The results of this comparison are 
shown in Fig. 3(b), where χ² and the MAE are plotted as a function of N_max. Remarkably, 
the NSD reconstructed by the NN for N_max = 16 performs better than any case using the 
HS method. It is worth observing that the experimental data used to infer the 
NSD parameters are only a part of those used to estimate χ² and MAE(C(τ, N)). For example, 
for N_max = 16, only the data for N = 1, 8, 16 are used to reconstruct the NSD, but we 
employ all the data N = 1, 8, 16, ..., 48 to obtain χ² and MAE(C(τ, N)). Overall, 
we have observed enhanced performance in reconstructing the NSD of the collective 
bath of spins, with a maximum improvement (about 7 times) for N_max = 16. In 
other words, for N_max = 16, once we reconstruct the NSD, the quantum sensor dynamics 
can be predicted with an average square deviation of ~ 1.86 experimental error-bars by 
using the NN method, or with an average square deviation of ~ 13 error-bars if we use 
the HS method. 
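
The two figures of merit can be sketched as follows. The definitions used here are the standard textbook ones and are an assumption of this example (the precise expressions adopted in the paper are given in its Methods); in particular, taking the number of degrees of freedom as the number of data points minus the three fitted NSD parameters is a hypothetical choice, as are the function names and the toy numbers.

```python
import numpy as np

def reduced_chi_squared(c_exp, c_err, c_model, n_params=3):
    """Sum of squared residuals in units of the error bars, per degree of freedom."""
    resid = (c_exp - c_model) / c_err
    dof = c_exp.size - n_params   # assumed: data points minus fitted parameters
    return np.sum(resid ** 2) / dof

def mean_absolute_error(c_exp, c_model):
    """Average absolute deviation between measured and predicted coherence."""
    return np.mean(np.abs(c_exp - c_model))

# Toy numbers: every prediction is off by exactly one error bar (0.05)
c_exp = np.array([0.90, 0.50, 0.20, 0.60, 0.95])
c_err = np.full(5, 0.05)
c_model = c_exp + 0.05 * np.array([1, -1, 1, -1, 1])

chi2 = reduced_chi_squared(c_exp, c_err, c_model)   # 5 / (5 - 3) = 2.5
mae = mean_absolute_error(c_exp, c_model)           # 0.05
```

With this reading, a reduced chi-squared close to 1 means that the predicted curve deviates from the data by about one error bar on average, which is how the values of about 1.86 and 13 quoted above can be interpreted.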


3. Discussion 


As shown pictorially in Fig. 1, the NN takes as input the spin qubit coherence functions 
(the coherence of the quantum sensor decays due to the presence of the external bath) 
obtained by using a set of different CP control sequences. The NN returns as output 
the parameters of the unknown NSD in the frequency domain. One can thus note that 
the NN, once validated, acts as a “time-frequency converter” (making use of a quite 
complicated deconvolution) from the measured signals living in the time domain, i.e., the 
spin coherence functions, to the NSD defined in the frequency domain. 

The results shown in the previous section, and summarized in Figs. 2 and 3(b), 
demonstrate that NNs can be used to reconstruct the NSD affecting a quantum sensor, 
achieving higher precision and with considerably less data than the standard HS method. 
Improved values of the reconstruction accuracy have been obtained with simulated and 
experimental data. Both the HS and NN methods are comparable, in terms of NSD 
reconstruction accuracy, for high values of N_max, but not for small ones, where NNs give 
significantly better results. Moreover, the main result of our study is that NNs trained 
with data obtained for N_max = 16 reconstruct the NSD more accurately than the best 
estimate provided by the HS method with N_max = 48. This improvement is remarkable 
by itself, but it becomes more significant when we consider that the time required to 
complete these experiments grows faster than linearly with 
N_max, following an arithmetic progression. As an example, the total time to perform all the 
experiments in the case of N_max = 16 and 48 is respectively ~ 10 minutes and ~ 1.2 hours 
(for this estimation we consider 10⁵ repetitions as in our experiments; we recall that the 
total time for each repetition of the single experiment is T = Nτ). This is an 
underestimation of the time difference between methods, because we are only considering the 
bare measurement time, without taking into account the time delay between different 
experiments. Furthermore, it is worth stressing that our results also show that deep 
learning has predictive power, since it can be applied to never-before-seen data. This 
naturally gives the employed machine learning models a robustness 
that is crucial in real applications. 
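
The quoted measurement times can be reproduced with simple arithmetic. The sketch below assumes the experimental grid described in Sec. 2.4 (inter-pulse times from 3.3 to 3.66 μs and from 5.5 to 6.1 μs in steps of 20 ns), 10⁵ repetitions per point, and a bare duration T = Nτ per repetition; the variable and function names are hypothetical.

```python
import numpy as np

# Experimental tau grid in nanoseconds (20 ns step): 19 + 31 = 50 values
taus_ns = np.concatenate([np.arange(3300, 3661, 20), np.arange(5500, 6101, 20)])
pulses = [1, 8, 16, 24, 32, 40, 48]
repetitions = 10 ** 5

def total_time_s(n_max):
    """Bare measurement time: sum of T = N * tau over all (tau, N <= n_max), times repetitions."""
    n_sum = sum(N for N in pulses if N <= n_max)
    return repetitions * n_sum * int(taus_ns.sum()) * 1e-9

print(f"N_max = 16: {total_time_s(16) / 60:.1f} min")   # N_max = 16: 10.2 min
print(f"N_max = 48: {total_time_s(48) / 3600:.2f} h")   # N_max = 48: 1.15 h
```

Since the sum of N over the arithmetic progression 8, 16, ..., N_max grows quadratically with N_max, the total time grows faster than linearly, in agreement with the estimate of about 10 minutes versus about 1.2 hours.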

As a general comment, we stress that it is difficult to identify a definite reason why
a machine learning model is more accurate (especially in the case of small N) than a
standard DD technique for noise spectroscopy. As said above, we observe experimentally
that the employed NNs are able to learn nontrivial patterns in the sequences of
input-output data. What the neural network learns is to invert C(τ, N) as a function of the
noise parameters s_0, A and σ (see Eqs. (2) and (4) in the main text) that we aim to
reconstruct. It is known that NNs are universal approximators of functions: this may be
the reason why they are well suited to finding the parameters of the NSD from C(τ, N).
Moreover, NNs do not rely on approximations of the filter function, and they manage
to find the noise parameters even for input data containing values of C(τ, N) with small
N. In contrast, the Álvarez-Suter method, as well as the HS methods in general,
arises from approximating the filter function as a Dirac comb. This approximation is not
valid for a small number of pulses, hence standard DD techniques are expected to
reconstruct the noise spectrum poorly in that regime. In addition, reconstructing the
parameters of an NSD from experimental data with an NN trained on synthetic data has been made
possible by training the NN over an informative set of noise samples, used to generate
the synthetic data. The latter, indeed, are given by a collection of values of C(τ, N) that
implicitly include a parameterization of the NSD that is reasonable for the experimental
setting; in our case, a Gaussian distribution whose offset, amplitude, width and center
belong to finite-valued intervals estimated from similar experimental conditions (e.g.,
Ref. [37]).
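
As an illustration of how such noise samples might be drawn, the following minimal sketch samples the four Gaussian parameters from finite intervals and evaluates the corresponding NSD. It is not the code used in this work: the functional form (a constant offset plus a Gaussian peak), the uniform sampling, the function names and the dimensionless intervals are all assumptions made only for this example.

```python
import math
import random

# Hypothetical, dimensionless placeholder intervals (NOT the values used in the paper).
PARAM_INTERVALS = {
    "offset": (0.0, 0.1),
    "amplitude": (0.1, 1.0),
    "width": (0.01, 0.1),
    "center": (0.3, 0.6),
}


def sample_nsd_params(rng, intervals=PARAM_INTERVALS):
    """Draw one set of NSD parameters, each uniformly within its interval (assumed)."""
    return {name: rng.uniform(low, high) for name, (low, high) in intervals.items()}


def gaussian_nsd(omega, offset, amplitude, width, center):
    """Assumed NSD shape: constant offset plus a Gaussian peak centered at `center`."""
    return offset + amplitude * math.exp(-((omega - center) ** 2) / (2.0 * width ** 2))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    params = sample_nsd_params(rng)
    # At the peak center the NSD equals offset + amplitude.
    print(params, gaussian_nsd(params["center"], **params))
```

Each sampled parameter set would then be mapped to a set of coherence values C(τ, N) through Eqs. (2) and (4) of the main text, a step that is not reproduced here.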

Let us also observe that regression tasks, which are successfully solved by
multi-layer perceptrons (one of the simplest forms of NN), are less common than
classification tasks; a review of some example datasets and methods for
regression is in Ref. [56]. Hence, we expect that the synthetic data used in this work
could also be useful as a test bed for machine learning researchers and
developers solving regression problems in different contexts. With this in mind, we share
the training dataset with synthetic data and our codes for their generation, as well as
the code for machine learning experiments and NSD reconstruction (available on the
GitHub repository; see Section “Data and code availability”). In this way, we promote
the improvement of machine learning models for noise sensing purposes and their use
to solve different regression tasks in the quantum estimation framework.


4. Conclusions 


In this paper, we use neural networks (NNs) to carry out noise spectroscopy with a
quantum sensor using dynamical decoupling sequences with a much smaller number of
π pulses and, at the same time, achieving a higher reconstruction accuracy than standard
methods (e.g., the HS protocol). This means that, with our proposal, the noise spectroscopy
procedure takes less time and gives better results. In more detail, we experimentally
demonstrate the capability of NNs to reconstruct the NSD of the collective nuclear spin
bath that surrounds an electronic spin qubit, i.e., the ground state of a single
nitrogen-vacancy center in bulk diamond at room temperature.

To conclude, we outline some possible directions for future work. First of all, one may
evaluate the performance of NNs that are trained over input data obtained using DD
control sequences with more degrees of freedom than the CP ones [57, 58, 59, 60, 61].
Secondly, deep learning might be applied to noise spectroscopy techniques beyond the
HS methods, for example optimal band-limited control protocols [39, 40] and even
non-Gaussian noise characterization [62, 63, 64]. In this regard, notice that the NNs
take as input the data associated with the spin coherence, and return as output the
parameters of the noise spectral density. Therefore, a new NN for the characterization
of a spin qubit’s environment can be trained with coherence curves obtained using
any kind of coherent control sequence. The study of the performance of NNs trained
with data from these more general control protocols is the next step in understanding
how machine learning can enhance quantum sensing. In addition, it might be worth
investigating how deep learning can be integrated into quantum sensing procedures that
rely on the so-called stochastic quantum Zeno effect [65, 66], whereby the quantum probe
is subjected to a sequence of quantum measurements that in the ideal case are designed
to confine the dynamics of the probe around the initial (nominal) state [38, 67, 68]. We
are also confident that our results can be readily replicated in other
experimental settings, e.g., superconducting flux qubits [69, 70], trapped ions [71, 72],
cold atoms [73, 74], quantum dots [75, 76], NMR experiments in molecules [36, 77], and
nanoelectronic devices [78]. For such a purpose, one might slightly adapt the deep
learning techniques used here to methods tailored for time series.


5. Methods 


5.1. Technical details on the training of NNs 


The NN models are developed using the PyTorch framework [79] on a machine with 32
CPU cores, 126 GB of RAM and a GeForce RTX 3090 GPU. The training time, including
the optimization of the hyperparameters, is around 12 hours for each N.

Hyperparameter optimization is implemented by means of the Ray Tune
library [80]. The Hyperopt package [81] uses the Tree-structured Parzen Estimator [82]
algorithm, a form of Bayesian optimization, to search for the best choice of the
hyperparameters within a predefined search space. Hyperopt suggests the configurations of
the hyperparameters that are likely to be better, and the underlying model is updated
after each trial that is run. The ASHA scheduler [83] is then used to stop early the least
promising trials chosen by the search algorithm, thus speeding up the hyperparameter
optimization process.

The optimized hyperparameters are the following. (1) The number of hidden layers
sets the value of L − 1 in Eq. (7). The hidden layers lie between the input layer
h[0] and the output layer h[L]. (2) The dimension of the hidden layers is the value of
q in Eq. (6) that, for the sake of simplicity, is equal for all the layers in Eq. (7). Both
the number and the dimension of the hidden layers are chosen by sampling an integer
log-uniformly from the intervals [1, 32) and [1, 1024), respectively. (3) The learning rate
sets the length of the gradient descent step and is chosen among
10⁻², 10⁻³ and 10⁻⁴. (4) The batch size is the number of samples over
which the loss function is summed for the gradient calculation in a single descent step.
It is chosen among 2, 4, 8, 16 and 32. (5) Dropout is a regularization
strategy that aims to reduce overfitting by randomly turning off the NN neurons with
a predefined probability. This probability is one among 0 (no dropout), 0.2 and 0.5. (6)
Weight decay is another regularization technique that adds to the loss function the
squared weights of the NN multiplied by a decay value. The latter is chosen
among 0 (no decay), 10⁻⁶, 10⁻⁵, 10⁻⁴ and 10⁻³.
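
To make the search space concrete, the sketch below draws one random configuration from it using only the Python standard library. It is not the Ray Tune/Hyperopt setup used in this work: the dictionary keys and function names are hypothetical, the learning-rate options are taken to be 10⁻², 10⁻³ and 10⁻⁴, and plain random sampling stands in for the Tree-structured Parzen Estimator search.

```python
import math
import random


def sample_log_uniform_int(rng, low, high):
    """Sample an integer in [low, high) whose logarithm is uniformly distributed."""
    value = math.exp(rng.uniform(math.log(low), math.log(high)))
    # Clamp to guard against floating-point rounding at the interval edges.
    return max(low, min(int(value), high - 1))


def sample_config(rng):
    """Draw one hyperparameter configuration (key names are hypothetical)."""
    return {
        "num_hidden_layers": sample_log_uniform_int(rng, 1, 32),
        "hidden_dim": sample_log_uniform_int(rng, 1, 1024),
        "learning_rate": rng.choice([1e-2, 1e-3, 1e-4]),  # 1e-2 is an assumed option
        "batch_size": rng.choice([2, 4, 8, 16, 32]),
        "dropout": rng.choice([0.0, 0.2, 0.5]),
        "weight_decay": rng.choice([0.0, 1e-6, 1e-5, 1e-4, 1e-3]),
    }


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    print(sample_config(rng))
```

Sampling the layer count and width log-uniformly makes small architectures as likely to be tried as large ones, which is presumably why a logarithmic scale is used for these two integers.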

To facilitate the reproducibility of the experiments, we summarize in Table 1
the optimal values of the hyperparameters for the trained models. Each value of N
defines the input size of the neural network. Therefore, a separate optimization of the
hyperparameters is performed for each case.


Table 1: Hyperparameters for the employed machine learning models. For each value of
N (that determines the size of the input layer) we report: the number of hidden layers
(h. l. num.), the dimension of each hidden layer (h. l. dim.) and the values of learning
rate (learning r.), batch size, dropout and weight decay (weight d.).

N    h. l. num.   h. l. dim.   learning r.   batch size   dropout   weight d.
1    1            2            10⁻²          16           0         10⁻³
8    5            328          10⁻⁴          4            0         10⁻⁴
16   2            133          10⁻³          8            0         10⁻⁶
24   3            224          10⁻⁴          2            0         10⁻⁴
32   3            145          10⁻⁴          4            0         10⁻⁵
40   3            286          10⁻⁴          4            0         10⁻⁴
48   3            38           10⁻³          8            0         10⁻⁴


5.2. Definition of quantifiers for reconstruction accuracy 


The accuracy of the NN and HS methods can be estimated by using the reconstructed
NSD to simulate the coherence function C(τ, N), and ‘measuring’ the distance between
the simulated data and the experimental values. To do so, we use the reduced
chi-squared χ² and the Mean-Absolute-Error MAE(C). We define C_e ± δC_e (C_s) as the
experimental (simulated) values of C(τ, N), where δC_e is the standard deviation of the
experimental data. Then we can write the reduced chi-squared and the MAE as

\[
\chi^2 = \frac{1}{\nu} \sum_{N \in \mathcal{N}} \sum_{n} \frac{\left( C_e(\tau_n, N) - C_s(\tau_n, N) \right)^2}{\delta C_e(\tau_n, N)^2}, \tag{12}
\]

\[
\mathrm{MAE}(C) = \frac{1}{\nu} \sum_{N \in \mathcal{N}} \sum_{n} \left| C_e(\tau_n, N) - C_s(\tau_n, N) \right|, \tag{13}
\]

where 𝒩 = {1, 8, 16, 24, ..., N}, {τ_n} are the values of the time between pulses within
the time intervals defined in the main text, and ν is the total number of elements in the
sum. Notice that χ² takes into account the experimental precision to scale the difference
between experiment and simulation. The results showing both χ² and the MAE are in
Fig. 3.
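
The two quantifiers of Eqs. (12) and (13) can be computed as in the following minimal sketch. It assumes that the experimental values, their standard deviations and the simulated values have already been flattened into equally long sequences, one entry per pair (τ_n, N); the function and argument names are illustrative and are not taken from the released code.

```python
def reduced_chi_squared(c_exp, dc_exp, c_sim):
    """Mean squared difference, each term scaled by the experimental uncertainty."""
    if not (len(c_exp) == len(dc_exp) == len(c_sim)) or len(c_exp) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    nu = len(c_exp)  # total number of elements in the sum
    total = sum(((ce - cs) / dce) ** 2 for ce, dce, cs in zip(c_exp, dc_exp, c_sim))
    return total / nu


def mean_absolute_error(c_exp, c_sim):
    """Mean absolute difference between experimental and simulated coherence."""
    if len(c_exp) != len(c_sim) or len(c_exp) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(ce - cs) for ce, cs in zip(c_exp, c_sim)) / len(c_exp)


if __name__ == "__main__":
    # Toy numbers for illustration only (not experimental data).
    c_exp = [1.0, 0.5]
    dc_exp = [0.1, 0.1]
    c_sim = [0.9, 0.7]
    print(reduced_chi_squared(c_exp, dc_exp, c_sim))
    print(mean_absolute_error(c_exp, c_sim))
```

A reduced chi-squared close to one indicates that the simulation differs from the experiment by about one standard deviation on average, whereas the MAE measures the deviation without reference to the experimental precision.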


Data and code availability 


The source codes for the generation of the training dataset and the machine
learning experiments are available on GitHub at the following address:
https://github.com/trianam/noiseSpectroscopyNV